Technical  Research  Report 


Exact  Subpixel  Motion  Estimation  in 
DCT  Domain 


by  U-V.  Koc  and  K.J.R.  Liu 


T.R.  96-2 


IGUR] 


Sponsored  by 

the  National  Science  Foundation 
Engineering  Research  Center  Program , 
the  University  of  Maryland, 

Harvard  University, 
and  Industry 


Report  Documentation  Page 

Form  Approved 

OMB  No.  0704-0188 

Public  reporting  burden  for  the  collection  of  information  is  estimated  to  average  1  hour  per  response,  including  the  time  for  reviewing  instructions,  searching  existing  data  sources,  gathering  and 
maintaining  the  data  needed,  and  completing  and  reviewing  the  collection  of  information.  Send  comments  regarding  this  burden  estimate  or  any  other  aspect  of  this  collection  of  information, 
including  suggestions  for  reducing  this  burden,  to  Washington  Headquarters  Services,  Directorate  for  Information  Operations  and  Reports,  1215  Jefferson  Davis  Highway,  Suite  1204,  Arlington 

VA  22202-4302.  Respondents  should  be  aware  that  notwithstanding  any  other  provision  of  law,  no  person  shall  be  subject  to  a  penalty  for  failing  to  comply  with  a  collection  of  information  if  it 
does  not  display  a  currently  valid  OMB  control  number. 

1.  REPORT  DATE 

1996 

2.  REPORT  TYPE 

3.  DATES  COVERED 

00-00-1996  to  00-00-1996 

4.  TITLE  AND  SUBTITLE 

5a.  CONTRACT  NUMBER 

Exact  Subpixel  Motion  Estimation  in  DCT  Domain 

5b.  GRANT  NUMBER 

5c.  PROGRAM  ELEMENT  NUMBER 

6.  AUTHOR(S) 

5d.  PROJECT  NUMBER 

5e.  TASK  NUMBER 

5f.  WORK  UNIT  NUMBER 

7.  PERFORMING  ORGANIZATION  NAME(S)  AND  ADDRESS(ES) 

Department  of  Electrical  Engineering, Institute  for  Systems 

Research, University  of  Maryland, College  Park, MD, 20742 

8.  PERFORMING  ORGANIZATION 

REPORT  NUMBER 

9.  SPONSORING/MONITORING  AGENCY  NAME(S)  AND  ADDRESS(ES) 

10.  SPONSOR/MONITOR'S  ACRONYM(S) 

11.  SPONSOR/MONITOR'S  REPORT 
NUMBER(S) 

12.  DISTRIBUTION/AVAILABILITY  STATEMENT 

Approved  for  public  release;  distribution  unlimited 

13.  SUPPLEMENTARY  NOTES 

14.  ABSTRACT 

see  report 

15.  SUBJECT  TERMS 

16.  SECURITY  CLASSIFICATION  OF: 

17.  LIMITATION  OF 
ABSTRACT 

18.  NUMBER 
OF  PAGES 

32 

19a.  NAME  OF 
RESPONSIBLE  PERSON 

a.  REPORT 

unclassified 

b.  ABSTRACT 

unclassified 

c.  THIS  PAGE 

unclassified 

Standard  Form  298  (Rev.  8-98) 

Prescribed  by  ANSI  Std  Z39-18 


Exact  Subpixel  Motion  Estimation  In  DCT  Domain 


Ut-Va  Koc  and  K.  J.  Ray  Liu 


Electrical  Engineering  Department  and  Institute  for  Systems  Research 
University  of  Maryland  at  College  Park 
College  Park,  Maryland  20742 
kocOeng . umd . edu  and  kjrliuSeng . umd . edu 


ABSTRACT 

Currently  existing  subpixel  motion  estimation  algorithms  require  interpolation  of  inter-pixel  values 
which  undesirably  increases  the  overall  complexity  and  data  flow  and  deteriorates  estimation  accuracy. 
In  this  paper,  we  develop  DCT-based  techniques  to  estimate  subpel  motion  at  different  desired  subpel 
levels  of  accuracy  in  DCT  domain  without  interpolation.  We  show  that  subpixel  motion  information 
is  preserved  in  the  DCT  of  a  shifted  signal  under  some  condition  in  the  form  of  pseudo  phases  and  es¬ 
tablish  subpel  sinusoidal  orthogonal  principles  to  extract  this  information.  Though  applicable  to  other 
areas  as  well,  the  resulted  algorithms  from  these  techniques  for  video  coding  are  flexible  and  scalable 
in  terms  of  estimation  accuracy  with  very  low  computational  complexity  0(N2)  compared  to  ()(N4) 
for  Full  Search  Block  Matching  Approach  and  its  subpixel  versions.  Above  all,  motion  estimation  in 
DCT  domain  instead  of  spatial  domain  simplifies  the  conventional  hybrid  DCT-based  video  coder,  es¬ 
pecially  the  heavily  loaded  feedback  loop  in  the  conventional  design,  resulting  in  a  fully  DCT-based 
high-throughput  video  codec.  In  addition,  the  computation  of  pseudo  phases  is  local  and  thus  a  highly 
parallel  architecture  is  feasible  for  the  DCT-based  algorithms.  Finally  simulation  on  video  sequences  of 
different  characteristics  shows  comparable  performance  of  the  proposed  algorithms  to  block  matching 
approaches. 

Keywords:  motion  estimation,  subpixel  accuracy,  video  coding,  video  compression,  shift  measurement 
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I.  Introduction 


Accurate  estimation  of  displacement  or  location  of  a  signal  or  image  is  important  in  many  applica¬ 
tions  of  signal  and  image  processing  such  as  time  delay  estimation  [21],  target  tracking  [35],  non-contact 
measurement  [40],  [2],  remote  sensing  [4],  [11],  computer  vision  [1],  image  registration  [8],  [38],  and  so 
on.  In  video  coding,  motion  estimation  is  proved  to  be  very  useful  for  reduction  of  temporal  redundancy. 
Therefore,  a  number  of  motion  estimation  algorithms  have  been  devised  solely  for  video  coding  [29],  [10] 
and  numerous  VLSI  architectures  have  been  designed  for  practical  video  applications  [33].  To  further 
improve  the  compression  rate,  motion  estimation  with  subpixel  accuracy  is  essential  because  movements 
in  a  video  sequence  are  not  necessarily  multiples  of  the  sampling  grid  distance  in  the  rectangular  sam¬ 
pling  grid  of  a  camera.  It  is  shown  that  significant  improvement  of  coding  gain  can  be  obtained  with 
motion  estimation  of  half  pixel  or  finer  accuracy  [16].  Further  investigation  reveals  that  the  temporal 
prediction  error  variance  is  generally  decreased  by  subpixel  motion  compensation  but  beyond  a  certain 
“critical  accuracy”  the  possibility  of  further  improving  prediction  by  more  accurate  motion  compensa¬ 
tion  is  small  [13].  As  suggested  in  [16],  [12],  motion  compensation  with  1/4-pel  accuracy  is  sufficiently 
accurate  for  broadcast  TV  signals,  but  for  videophone  signals,  lialf-pel  accuracy  is  good  enough.  As 
a  result,  motion  compensation  with  half-pel  accuracy  is  recommended  in  MPEG  standards  [27],  [28]. 
Implementations  of  half-pel  motion  estimation  have  started  to  be  realized  [39],  [3],  [6]. 

Many  subpixel  motion  estimation  schemes  have  been  proposed  over  the  years  [1],  [29],  [10].  The  most 
commonly  used  spatial-domain  fractional-pel  motion  estimation  algorithms  such  as  the  block  matching- 
approach  [26],  [12],  [9],  the  pel-recursive  approach  [30],  [31],  require  interpolation  of  images  through 
bilinear,  Lagrange,  or  other  interpolation  methods  [34].  However,  interpolation  not  only  increases  the 
complexity  and  data  flow  of  a  coder  but  also  may  adversely  affect  the  accuracy  of  motion  estimates 
from  the  interpolated  images  [12].  It  is  more  desirable  that  subpixel  accuracy  of  motion  estimates 
can  be  obtained  without  interpolating  the  images.  In  the  category  of  frequency-domain  methods,  the 
phase  correlation  technique  [37],  [41],  [22]  is  reported  to  provide  accurate  estimates  without  inter-pixel 
interpolation  but  is  based  on  Fast  Fourier  Transform  (FFT),  which  is  incompatible  with  DCT-based 
video  coding  standards  and  requires  a  large  search  window  at  a  high  computational  cost.  Other  FFT- 
based  approaches  such  as  in  [17],  [20]  also  have  similar  drawbacks. 

Due  to  the  fact  that  the  motion  compensated  DCT-based  hybrid  approach  is  the  backbone  of  several 
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(a)  Hybrid  Coder  (b)  Fully  DCT-Based  Coder 

Fig.  1.  Coder  structures:  (a)  Commonly  used  motion-compensated  DCT  hybrid  coder  performs  motion  estimation 
in  the  spatial  domain,  (b)  Fully  DCT-based  coder  estimates  motion  in  the  transform  domain. 


international  video  coding  standards  such  as  CCITT  H.261  [14],  MPEG1  [27],  MPEG2  [28],  and  the 
emerging  HDTV  [5]  and  H.263  [15]  standards,  it  is  more  desirable  to  estimate  motion  with  fractional- 
pel  accuracy  without  any  inter-pixel  interpolation  at  a  low  computational  cost  in  the  DCT  domain  so 
that  seamless  integration  of  the  motion  compensation  unit  with  the  spatial  compression  unit  is  pos¬ 
sible.  More  specifically,  a  conventional  standard-compliant  video  coder  is  usually  implemented  as  a 
hybrid  DCT-based  structure  in  Fig.  1(a),  which  achieves  spatial  compression  through  Discrete  Cosine 
Transform  (DCT)  and  temporal  compression  through  motion  compensation  traditionally  accomplished 
in  spatial  domain.  In  this  hybrid  structure,  the  feedback  loop  contains  three  major  components:  DCT, 
IDCT  (Inverse  DCT)  and  SD-ME  (spatial  domain  motion  estimation).  All  incoming  raw  video  data 
must  traverse  this  heavily  loaded  feedback  loop  once  in  order  to  be  encoded  in  the  output  bitstream. 
In  addition  to  the  disadvantage  of  having  more  hardware  components,  the  throughput  of  the  whole 
coder  is  also  limited  by  the  complexity  of  the  loop.  However,  if  motion  can  be  estimated  and  com¬ 
pensated  entirely  in  the  transform  domain,  then  DCT  can  be  moved  out  of  the  loop  and  IDCT  be 
eliminated,  resulting  in  a  fully  DCT-based  video  coder  as  shown  in  Fig.  1  (b)  where  the  feedback  loop 
has  only  one  major  component,  transform  domain  motion  estimation  (TD-ME)  [23]  instead  of  three 
major  components. 

Based  upon  the  concept  of  pseudo-phases  in  DCT  coefficients  and  the  sinusoidal  orthogonal  princi¬ 
ples,  a  DCT-based  integer-pel  motion  estimation  scheme  (DXT-ME)  of  very  low  computational  com¬ 
plexity  ( 0(N 2)  as  opposed  to  0(NA)  for  the  widely  used  Full  Search  Block  Matching  Algorithm)  was 
proposed  in  [18],  [19]  to  realize  the  fully  DCT-based  video  coder  design,  as  depicted  in  Fig.  2  and  summa¬ 
rized  in  Table  I.  In  this  paper,  we  further  explore  this  DCT-based  concept  at  the  subpixel  level  and  show 
that  if  the  spatial  sampling  of  images  satisfies  the  Nyquist  criterion,  the  subpixel  motion  information  is 


2 


1.  Compute  the  2-D  DCT  coefficients  of  second  kind  (2D-DCT-II)  of  a  N  x  N  block  of  pixels  at 
the  current  frame  t,  {xt(m,n)\m,n  E  {0 —  1}}. 

2.  Convert  the  stored  2D-DCT-II  coefficients  of  the  corresponding  N  x  N  block  of  pixels  at  the 
previous  frame  t  —  1,  {xt-i(m,n)\m,n  E  {0, N  —  1}}  to  2D  DCT  coefficients  of  first  kind 
(2D-DCT-I)  through  a  simple  rotation  unit  T. 

3.  Find  the  pseudo  phases  {gcs (k,  l)\  k  =  0, 1, . . .  ,N  —  1;  l  —  1, 2, . . . ,  N}  and  {g3C(k, /);  k  = 
1,2,...,  iV;  l  —  0, 1, . . . ,  N  —  1},  which  are  calculated  from  the  DCT  coefficients  independently 
at  each  spectral  location  (kj). 

4.  Determine  the  normalized  pseudo  phases  f(k,l )  and  g(k,l)  from  gcs(k,l)  and  gsc(k,l)  re¬ 
spectively  by  setting  ill-formed  gcs(k,l)  and  gsc(k,l)  to  zero: 


/(M)  = 


g(k,i)  = 


C(k)C(l)gcs(k,l),  for  |/^(M)|  <  1, 
0,  otherwise, 

C(k)C(l)gsc(k,l),  for  \gsc(k,l)\  <  1, 
0,  otherwise, 


where 


C(  „)  =  (75'  forn  =  0oriV' 

[  1,  otherwise, 

5.  Obtain  the  inverse  DCT  (2D-IDCT-II)  of  f(k,  l)  and  g{k,l)  as  DCS(m,n)  and  DSC{m,n) 
for  m,n  €  {0, ...,  iV  —  1}  respectively  which  basically  are  composed  of  impulse  functions  whose 
peak  positions  indicate  the  shift  amount  and  peak  signs  reveal  the  direction  of  the  movement: 

4  N  k-TT  1  1 7T  1 

DCS(m,  n)  =  £  £  C(k)C{l)f{k,  l)  cos  —  (m  +  -)  sin  —  (n  +  -), 

iV  k= o  1=1 

aNN-1  I  1  hr  1 

DSC(m,  n)  =  C(k)C{l)g{k,l)  sin  —  (m  +  -)  cos  —  (n  +  -). 

fc—  1  (=0 


6.  Search  for  the  peaks  of  DCS(m ,  n)  and  DSC(m,  n)  over  (m,  n)  E  {0, . . . ,  N  —  l}2  (or  range 
of  interest)  such  that 

(* DCS,  j DCs)  —  arg  max  \DCS{m,n)\, 
m,n€‘ I* 

(■ iDSC,3DSC )  =  arg  max  \DSC{m,n)\. 
mtn(z<P 

7.  Estimate  the  displacement  d  —  ( mu,rhv )  from  the  signs  and  positions  of  the  peaks  of 
DC S(m,n)  and  DSC(m,n): 


iDSC  =  idcs ,  if  DSC(iDsc,  Jdsc)  >  0, 

~~{}DSC  +  1)  =  -{iDCS  +  1),  if  DSC{iDsc,jDSC )  <  0, 

jDCS  =  JDSC,  if  DCS(ioCS,jDCs)  >  0, 

~{jDCS  +  1)  =  ~(j DSC  +  1),  if  DCS(iDCS,3DCS )  <  0, 


TABLE  I 

Summary  of  DCT-based  integer-pel  motion  estimation  scheme  (DXT-ME) 
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Fig.  2.  Block  diagram  of  the  DCT-based  integer-pel  motion  estimator  (DXT-ME) 


preserved  in  the  pseudo  phases  of  DCT  coefficients  of  moving  images.  Furthermore  it  can  be  shown  that 
with  appropriate  modification,  the  sinusoidal  orthogonal  principles  can  still  be  applicable  except  that 
an  impulse  function  is  replaced  by  a  sine  function  whose  peak  position  reveals  subpixel  displacement. 
Therefore,  exact  subpixel  motion  displacement  can  be  obtained  without  the  use  of  interpolation.  From 
these  observations,  we  can  develop  a  set  of  subpixel  DCT-based  motion  estimation  algorithms,  that  are 
fully  compatible  with  the  integer-pel  motion  estimator,  for  low-complexity  and  high-throughput  video 
applications. 

In  this  paper,  we  discuss  the  pseudo  phases  carrying  subpixel  motion  information  in  Section  II  and 
the  subpel  sinusoidal  orthogonal  principles  in  Section  III  for  objects  moving  out  of  synchronization  with 
the  sampling  grid.  In  Section  IV,  we  propose  the  DCT-based  half-pel  (HDXT-ME)  and  quarter-pel 
(QDXT-ME  and  Q4DXT-ME)  motion  estimation  algorithms  whose  simulation  results  on  actual  video 
sequences  of  different  characteristics  are  presented  in  Section  V  in  comparison  with  the  popular  block 
matching  approaches.  Finally,  we  conclude  the  major  contributions  of  this  paper  in  Section  VI. 

II.  Pseudo  Phases  at  Subpixel  Level 

A.  One  Dimensional  Signal  Model 

Without  loss  of  generality,  let  us  consider  the  one  dimensional  model  in  which  a  continuous  signal 
xc{t)  and  its  shifted  version  xc(t  -  d)  are  sampled  at  a  sampling  frequency  1/T  to  generate  two  sample 
sequences  {xi(n)  =  xc(riT)}  and  {a;2(n)  =  xc(nT  -  d)},  respectively.  Let  us  define  the  DCT  and  DST 
coefficients  as 

Xf(k)  =  DCTja;*}  =  Y  Xi{n)  cos  ^(n  +  i),  (1) 

n=0 

xf  (ft)  =  DSTfij}  =  Y  x'*(n)  siaJf(n  +  (2) 

71=0 
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where 


C(k)  = 


^=,  for  k  =  0  or  N, 
1,  otherwise, 


for  i  —  1  or  2.  By  using  the  sinusoidal  relationship: 

cos  ^r(n  +\)  =  ^[e^(n+5)  +  e-J#(n+i>];  sin  ^(n  +  ^)  =  ^[eJ'^(n+a)  -  (3) 

we  can  show  that  the  DCT/DST  and  DFT  coefficients  are  related  as  follows: 

Xf{k)  =  ®[Xf  (-AV  w  +  If  (fc)e->'&],  for  fc  =  0, . . . ,  N  -  1, 


N 

Xf(k)  =  ^[Xf  (4)^  -  Xf(A0e~^],  for  k  —  1,. . .  ,N, 

where  {Xf  (A;)}  is  the  DFT  of  the  zero-padded  sequence  {j;f  (n) }  defined  as 

{Xi(n),  for  n  =  0, . . . ,  N  —  1, 

0,  for  n  =  N, . . . ,  2N  —  1, 

so  that 

N-l 

Xf(k)  =  DFT{xf }  -  for  k  —  0, . . .  ,2N  -  1. 

n= 0 


(4) 

(5) 


(6) 


(7) 


From  the  sampling  theorem,  we  know  that  the  Discrete  Time  Fourier  Transform  (DTFT)  of  sequences 
xi (n)  and  x'2 (n)  are  related  to  the  Fourier  Transform  (FT)  of  xc(t),  Xc(ft),  in  the  following  way: 


XiM  =  DTFT{x!}  =  Xc(^^), 

^  i 


X2(w)  =  DTFT{x2}  =  ^£XC( 


1 


T 

,U )  —  27t/ 


)  e 


(8) 

(9) 


Furthermore,  if  Xc(fi)  is  bandlimited  in  the  baseband  (— y,  f ),  then  for  Q  =  fj?  €  (— |t,  J), 


Xr(fiT)  =  — Xc(fi), 
X2(ftT)  =  ^Xc(0)  e-jfW. 


Thus,  the  DFT  of  xq (n)  and  x2(n)  are 


N-l 

Xi(Jfc)  =  DFT{a,’x}  =  J2  xi{n)e~j 

n= 0 

N-l 
A 


2„kn  .27 rA;.  1  .2nk. 

N  =  Xi{— )  =  ^Xc(— ) 


X2(fc)  =  DFT{x2}  =  5]  x2(n)e 

n=0 


_  •  ‘2fffcn 

J  JV 


X2( 


FA 

27tA; 


1  ,,  ,2irk.  2t rkd 

,  —  —XA - )  e  J  nt  , 

N  '  T  y  NT 


(10) 

(11) 


(12) 

(13) 
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whereas  the  DFT  of  xf  (n)  and  xf  (n)  become 


(14) 

(15) 


*?(*)  =  = 


,nk 


1 


7T& 


^W=X2(-)  =  -Xc(— )c 


IV 


JVT 


■  ?rA:d 
“7  7VT 


Therefore, 


.  71  k  \  .  7T  A>‘  _  a  tl hd. 

x2(w)  =  Xl(w)e  ’-T. 


(16) 


Substituting  (16)  back  into  (4)-(5),  we  get 


X2c(fc)  = 
X|(^')  = 


g(fc) 

N 

C(k) 


jN 


[Xi(—k)e'lNTe^N  +  Xf  (£;)e  J«e  ], 
[Xf  (4)^^  -  Xf  (fc)e-^r^], 


for  k  =  0, . . . ,  IV  -  1, 
for  k  =  1, ...  ,1V. 


(17) 

(18) 


Using  the  sinusoidal  relationship  in  (3)  to  change  natural  exponents  back  to  cosine/sine,  we  finally 
obtain  the  relationship  between  x\ (n)  and  x2(n)  in  DCT/DST  domain: 

Xr ?(k)  =  5Z  xi(n)cos  ~(n  +  ^  +  i),  for  k  =  0, . . . ,  IV  -  1,  (19) 

n— 0 

X${k)  =  2C^~  x'i(n)sin^(n+  ^  +  |),  for  k  =  1, . . . ,  N.  (20) 

n=0 

We  conclude  the  result  in  the  following  theorem: 

THEOREM  1:  If  a  continuous  signal  xc(t)  is  ^-bandlimited  and  the  sampled  sequences  of  xc(t) 
and  xc(t  —  d)  are  { xc(nT )}  and  {xc(nT  —  d)},  respectively,  then  their  DCT  and  DST  are  related  by 


DCT {xc(nT  -  d)}  =  DCTd  {xc(nT)},  (21) 

DST {xc{nT  -  d)}  =  DST ±{xc{nT)},  (22) 

where 

DCTa{x}  =  2C^  x (n)  cos  a  +  §)’  (23) 

n= 0 

A  2C(k)  k-K  1. 

DST^x}  =  — x(n)sin  —  (n  +  f3  +  -),  (24) 

n=0 

are  the  DCT  and  DST  with  a  and  (3  shifts  in  their  kernels,  respectively.  Here  d  is  the  shift  amount  and 
T  is  the  sampling  interval,  but  d/T  is  not  necessarily  an  integer. 
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(a)  (b) 


Fig.  3.  (a)  The  black  dots  and  the  gray  squares  symbolize  the  sampling  grids  for  frames  It- 1  (u,  v)  and  It(u,  v)  at 
a  sampling  distance  d  respectively.  These  two  frames  are  aligned  on  the  common  object  displaced  by  (du,dv) 
in  the  continuous  coordinate  ( u,v ).  (b)  Two  digitized  images  of  consecutive  frames,  xt~i (m,n)  and  xt(m,n ), 
are  aligned  on  the  common  object  moving  (A^At,)  =  (du/d,dv/d)  pixels  southeast. 

B.  Two  Dimensional  Image  Model 

Consider  a  moving  object  casting  a  continuous  intensity  profile  /t(u,  v)  on  a  camera  plane  of  the 
continuous  coordinate  ( u,v )  where  the  subscript  t  denotes  the  frame  number.  This  intensity  profile 
is  then  digitized  on  the  fixed  sampling  grid  of  the  camera  with  a  sampling  distance  d  to  generate  the 
current  frame  of  pixels  xt(m,  n )  shown  in  Fig.  3(a)  where  m  and  n  are  integers.  Further  assume  that  the 
displacement  of  the  object  between  the  frames  t— 1  and  t  is  (du,  dv)  such  that  It(u,  v )  =  It-i(u-du,  v—dv) 
where  du  —  (mu  +  vu)d  =  A ud  and  dv  =  (rnv  +  uv)d  —  A vd.  Here  mu  and  rnv  are  the  integer  components 
of  the  displacement,  and  uu  and  uv  6  (  — ^].  Therefore, 

xt(m,n)  —  It(md,nd)  =  It-\{md  —  du,nd  —  dv), 
xt~i(m,n)  =  It-i(md,nd), 

as  in  Fig.  3(b).  Unlike  the  case  of  integer-pel  movement,  the  displacement  is  not  necessarily  multiples 
of  the  sampling  distance  d.  In  other  words,  vu  and  vv  do  not  necessarily  equal  zero. 

For  integer-pel  displacements,  i.  e.  Xu  =  mu  and  A,,  =  mv,  the  pseudo  phases  are  computed  by 
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solving  the  pseudo-phase  motion  equation  at  ( k,l )  [18],  [19]: 


Zj— i  (k,  l )  ■  9mutrnv  (A;,  /)  —  X-t(k,  /),  for  k ,  l  £  A/" 


(25) 


where  A/"  =  {1, . . . ,  IV  —  1},  is  the  pseudo-phase  vector,  and  the  4x4  system  matrix  Zt-i  and  the 

vector  xf  are  composed  from  the  2D-DCT-II  of  xt~\ (m,n)  and  the  2D-DCT-I  of  xt(m,n)  respectively: 


Z  t-i(k,l)  = 


Z£i(M) 

A“i(M) 

^i(M) 

££i(M) 


-Z£X(M) 

+zt“i(fc,o 

-%(M) 

+z?h(k,i) 


-za.iKi) 
+Z£  i(M) 
+z^(k,i) 


+Z^(k,l) 

—Zt-i(k,l) 

-Zfhfal) 

+Z£l(k,l) 


X?(k,l) 

9mumv  (k,l) 

Xt(M)  = 

x?s(k,l) 

j  9mu,mv{k,l)  — 

9mSumv  (k, l) 

X?c(k,l) 

9mumv{k,l) 

_  X?°(k,l)  _ 

9mumv  (k)  l ) 

Here  the  2D-DCT-I  of  xt-\(m,n)  are  defined  as: 

a  4  N~l  kn  In 

Z“i(M)  =  DCCTI{a:i-i}  =  ^2 C(k)C{l )  Y  xt-i(m,n)  cos[— (m)]  cos[— (w)], 

rn,n=0 

k,le{o,...,N}, 

A  4  ^  ^  fen  In 

zctix(k,l)  =  DCSTI{x't_i}  =  jpC(k)C{l)  Y  *t-i(™,«)cos[— (m)]sin[— (n)], 

m,n= 0 

fce{o,...,JV},ze{i,...,JVr-i}1 
Zt-i(k,l)  =  DSCTI{x-t_i}  =  ^ C(k)C(l )  *t_i(m,n)sin[^(m)]cos[-^(n)]f 

m,n= 0 

k  £  {1, . . .  ,  IV  —  l},f  €  {0, . . .  ,  IV}, 

a  4  4  fen  In 

ZfixikJ)  =  DSSTI{xt_i}  =  j^C{k)C{l)  Y  *i-i(m,n)sin[— (m)]sin[— (n)], 

TO,JI=0 

k,l  £  {1, . . . ,  AT  —  1}, 


(26) 

(27) 

(28) 

(29) 


and  the  2D-DCT-II  of  xt(m,n)  as 


X£c(k,l)  =  j^C(k)C(l)  Y  a:t(m,n)cos[^(m  +  0.5)]cos[^(n  +  0.5)],  (30) 

m,7i=0 

M  £{0,...,1V-1}, 

a  ^  A’7r  In 

Xts(k,l)  =  jpC(k)C(l)  Y  ®t(n».w)cos[— (m  +  0.5)]sin[— (n  +  0.5)],  (31) 

m,n= 0 
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(32) 


A  /L  ^  /-■  /TT-  J  rjr 

xtC(kJ)  =  j^C(k)C{l)  Y  xt{m,n)sm[— (m  +  0.5)] cos[—  (n  +  0.5)], 

m,n=0 

k  e  6  (0, . . .  ,iV  —  1}, 

A  4  ^*77-  lnr 

xtS(kJ)  =  jpC(k)C(l)  Y  xt(m) n)  sin[— (m  +  0.5)]  sin[— (n  +  0.5)],  (33) 

m,7i=0 


where  { Zf'fx ;  xx  =  cc,  cs ,  sc,  ss}  can  be  obtained  from  {X“;, ;  xx  =  cc,  cs,  sc,  ss}  by  a  simple  rotation: 


«  1 
INI 

Zt-i(k,  l) 

zr-i(kj) 

.  ^i(M) 

+  cos  Jf  cos  |f  +  cos  |f  sin  |f 


-  cos  |f  sin  |f 

-  sin  |f  cos  |f 
+  sin|^sin|^ 


k7T 
2N  ' 


In 

2N 


-sin|^cos^ 


+  sin  5W  cos  5F 


+  cos|f  cos|f 


kn 
‘2N  ' 


In 

2N 


+  cos  |f  COS  I51 


2N 

' cos  w sin 


sin  |f  sin  |f 
sin  |f  cos  |f 
cos  |f  sin  |f 
cos  |f  cos  |f 


xt-i(k,l) 

Xt-i(kJ) 

xt~i(k,  l) 

X^(k,l) 

•  (34) 


for  k,l  E  J\f  and  {Xf^xx  =  cc,  cs,  sc,  ss}  are  computed  and  stored  in  memory  in  the  previous  encoding 


cycle. 

However,  for  non-integer  pel  movement,  we  need  to  use  (21)-(22)  in  Theorem  1  to  derive  the  system 
equation  at  the  subpixel  level.  If  the  Fourier  transform  of  the  continuous  intensity  profile  It(u,v)  is 
J  -  band  limited  and  It(u,v)  ~  It-i(u  —  du ,  v  —  dv),  then  according  to  Theorem  1,  we  can  obtain  the 
following  2-D  relations: 


4  N.~1  kn  1  hr  1 

^f(M)  =  JpC{k)C{l)  Y  *t_i(m>n)cos[— (m  +  Au  + -)] cos[— (n  +  A„  + -)],  (35) 

TO, 71=0 

k,l  £  {0, . . . ,  N  —  1}, 

4  ^y  y  h'Tr  i  i tc  i 

Xr{k,l)  =  -2C(k)C(l)  Y  xt-i(m,n)cos[—(m  + \u  +  -)] sin[— (n  +  +  -)],  (36) 

TO, 71=0 

k  €  {0, . . .  ,N  —  l},l  6  {1, . . . ,  N}, 

4  N_1  kn  \  hr  1 

xtC(k,l)  =  Jj2C(k)c(l)  Y  xt-i(™,n)sm[—{m  +  \u  +  -)]cos[—{n  +  \v  +  -)},  (37) 

771,71=0 

k  £  {lj  •  •  •  j  X],  l  G  {0, . . . ,  N  —  1}, 

4  kir  1  hr  1 

xts(k,l)  =  j^C{k)C{l)  Y  *t-i(m,n)sin[—  (m  +  Xu  +  -)]sin[— (n  +  +  -)],  (38) 

771,71=0 

k,  l  E  ,N}. 


Thus,  we  can  obtain  the  pseudo-phase  motion  equation  at  the  subpixel  level: 

Zj-i (M)  '  8\u,\v(k,l)  =  *t(k,l),  for  k,l  G  A f, 


(39) 
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where  6\Ut \v(k,l)  -  [9\^\v(k,l),gx^\v{k,l),gx^\v(k,l),gx^\v(k,l)}T .  A  similar  relationship  between 


the  DCT  coefficients  of  xt(m,n)  and  xt-i(m,n)  at  the  block  boundary  can  be  obtained  in  the  same 


way: 


^i(M) 

-ZfidkJ) 

■s 

J? 

X?(ktl) 

.  ^i(M) 

zr-i(kj) 

.  9CxuS,xv(k,l) 

X?(k,l) 

Zt-i(k,l) 

-Z?l,(kJ) 

£3 

<£ 

Xfc(k,l) 

Z^(k,l) 

ZflAkJ) 

Xt(k,l) 

Zfli(k,l) 

~Zt-i(k,  l) 

<< 
p  ; 

Xstc{k,l ) 

Z^(k,l) 

zt-i(k,l) 

Xfs{k,l) 

- j 

TR 

j? 

-z^(kJ) 

’-■O 

•< 

04 

X?(k,l) 

_  Z^(k,l) 

Zt-i(k,l) 

.  &v(k,l)  _ 

xf(M) 

k  =  0,  l  e  AT, 

l  =  o,  k  e  M, 

k  =  N,  l  e  J\f, 

l  =  N,  k  e  AT, 

k  =  0,l  =  N, 
k  =  N,  l  =  0. 


(40) 


(41) 

(42) 

(43) 

(44) 

(45) 


In  (39),  the  pseudo  phase  vector  0xu,xv{k,l)  contains  the  information  of  the  subpixel  movement 
(Au,  A„).  In  an  ideal  situation  where  one  rigid  object  is  moving  translationally  within  the  block  boundary 
without  observable  background  and  noise,  we  can  find  0xUtxv(k,l)  explicitly  in  terms  of  \u  and  A„  as 
such: 


0\u,\ v(kJ) 


ol 

CD 

1 _ 

9ZSxv(k,l) 

&v(k,l) 

= 

_  9Xu,xv (k,  l)  _ 

cos  +  5)  cos  +  5) 

cos^(Au  +  l)sin^(Al,  +  ±) 
sin t^-(Au  +  l)cos  lj$(\v  +  1) 
sin^(Au  +  |)  sin  ^  (Aw  +  1) 


(46) 


III.  Subpel  Sinusoidal  Orthogonality  Principles 

In  [18],  [19],  estimation  of  integer-pel  displacements  in  DCT  domain  utilizes  the  sinusoidal  orthogonal 
principles: 


IDCTfCO)  cos[^(n  +  ^)]}  =  ^2  C*(k)  cos[~-(m  +  ^)]  cos[^(n  +  |)]  =  S(m  -  n)  +  8(m  +  n  +  1),  (47) 

fc= o 

IDST{C(fc)  sin [~(n  +  ^)]}  =  ]T)  C'\k)  sin[^(m  +  ^)]  sin [~{n  +  i)]  =  S(m  -  n)  -  S(m  +  n  +  1),  (48) 
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where  S(n)  is  the  discrete  impulse  function,  and  m,  n  are  integers.  This  is  no  longer  valid  at  the  subpixel 
level. 


In  (47)-(48),  we  replace  the  integer  variables  m  and  n  by  the  real  variables  u  and  v  and  define 

N- 1 


k= o 
N- 1 

E' 

k= 0 


kn  1  kn  .  1 

+  („  +  -), 

(49) 

kn  1  .  kn  1 

W(“+2)si"¥(”  +  2)' 

(50) 

We  show  in  the  Appendix  that 


Lc(u,v)  =  +  ^[f(u-w)  +  £(u  +  v  +  l)], 

Ls(u,v)  =  ^sin[7r(n+  ^)]sin[7r(n  +  ^)]  +  ^[£(u  -  v)  -  £{u  +  v  +  1)], 


(51) 

(52) 


where 


N-i 


,kn 


1, 


£(x)  =  Y  c°s(— x)  =  -[1  -  cos7rx  +  si 


cos 


sin  nx 


2N  ■ 


sm ; 


(53) 


k= 0  "  "  2Af 

If  is  so  small  that  the  second  and  higher  order  terms  of  can  be  ignored,  then  cos 


1  ciri  SSL  ~  I2L  Tbn<5 
I,  Sill  2jy  ~  2 N' 


£(x)  «  -[1  —  cos7rx]  +  Ns'mc(x), 


7TX 

2N 


(54) 


where  sinc(x)  =  sin(yrx)/(7r:r).  For  large  N,  £(x)  is  approximately  a  sine  function  whose  largest  peak 
can  be  identified  easily  at  x  =  0  as  depicted  in  Fig.  4,  where  £(x)  closely  resembles  N  ■  sinc(x),  especially 
when  x  is  small. 

A  closer  look  at  (51)-(52)  reveals  that  either  Lc(u,v)  or  Ls(u,v)  consists  of  £  functions  and  one 
extra  term  which  is  not  desirable.  In  order  to  obtain  a  pure  form  of  sine  functions  similar  to  (47)- (48), 
we  define  two  modified  functions  Lc(u,v)  and  Lc(u,v)  as  follows: 


r  ,  .A  kn  .  1 .  kn  1 . 

Lc(u,v)=  Y  cos— (u  +  -)  cos  —  (u  +  -), 


Ls(u,v)  =  Y  sinir(u+F)sinlT(u  +  F)- 


k= 0 
N-l 


N 

kn 


N 

kn 


k= l 


N 


N 


(55) 

(56) 


Then  we  can  show  that 


Lc{u,v )  =  -[£(u-u)+£(u  +  u  +  l)], 
Ls(u,v)  =  ^[£(u-«)  -£(u  +  «  +  !)]. 


(57) 

(58) 
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Fig.  4.  Plot  of  £,(x)  =  J2k= o  cos(ttx)  f°r  N  =  16.  Observe  the  similarity  between  the  curves  of  N*sinc(x)  and 
the  last  term  of  £. 

Equations  (55)- (58)  are  the  equivalent  form  of  the  sinusoidal  orthogonal  principles  (47)- (48)  at  the 
subpixel  level.  The  sine  functions  at  the  right  hand  side  of  the  equations  are  the  direct  result  of  the 
rectangular  window  inherent  in  the  DCT  transform  [32].  Fig.  5  (a)  and  (b)  illustrate  Ls(x,—  3.75)  and 
Lc(x,  —3.75)  respectively  where  two  f  functions  are  interacting  with  each  other  but  their  peak  positions 
clearly  indicate  the  displacement.  However,  when  the  displacement  v  is  small  (in  the  neighborhood  of 
-0.5),  £(u  —  v)  and  ((u+v  +  l)  move  close  together  and  addition/subtraction  of((u  —  v)  and£(u  +  u  +  l) 
changes  the  shape  of  Ls  and  Lc.  As  a  result,  neither  Ls  nor  Lc  looks  like  two  £  functions  and  the  peak 
positions  of  Ls  and  Lc  are  different  from  those  of  £(u  —  v)  and  £(u  +  v  +  1),  as  demonstrated  in  Fig.  5 
(c)  and  (d)  respectively  where  the  peak  positions  of  Ls(x,  —0.75)  and  Lc( x,  —0.75)  are  —1.25  and  —0.5, 
differing  from  the  true  displacement  —0.75.  In  the  extreme  case,  £(u  —  v)  and  £(u  +  v  +  1)  cancel  out 
each  other  when  the  displacement  is  -0.5  such  that  Ls(x,  —0.5)  =  0  as  shown  in  Fig.  5(e). 

Fortunately  we  can  eliminate  the  adverse  interaction  of  the  two  £  functions  by  simply  adding 
Lc  and  Ls  together  since  Lc(x,v)  +  Ls( x,v)  =  £(x  —  v)  as  depicted  in  Fig.  5(f)  where  the  sum 
Lc(x,—  0.75)  +  Ls(x,  —  0.75)  behaves  like  a  sine  function  and  its  peak  position  coincides  with  the  dis¬ 
placement.  Furthermore,  due  to  the  sharpness  of  this  if  function,  we  can  accurately  pinpoint  the  peak 
position  under  a  noisy  situation  and  in  turn  determine  the  motion  estimate.  This  property  enables  us 
to  devise  flexible  and  scalable  subpixel  motion  estimation  algorithms  in  the  subsequent  sections. 
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IDST  lor  displacement  =  -3.750000  IDCT  lor  displacement  =  -3.750000  IDST  lor  displacement  =  -0.750000 


(d)  Lc{x,  -0.75)  (e)  Ls(x,  -0.5)  (f)  Lc(x,  -0.75)  +  Ls(x,  -0.75) 

Fig.  5.  Illustration  of  sinusoidal  orthogonal  principles  at  the  subpixel  level  for  different  displacements. 


IV.  DCT-Based  Fractional-Pel  Motion  Estimation 

In  this  section,  we  apply  the  subpixel  sinusoidal  orthogonal  principles  to  develop  an  exact  sub¬ 
pixel  motion  displacement  scheme  without  the  use  of  interpolation  to  estimate  half-pel  and  quarter-pel 
movements  for  high  quality  video  applications. 

A.  DCT-Based  Half-Pel  Motion  Estimation  (HDXT-ME) 

From  (39)  in  Section  II,  we  know  that  the  subpixel  motion  information  is  hidden,  though  not  obvious, 
in  the  pseudo  phases.  To  obtain  subpixel  motion  estimates,  we  can  directly  compute  the  pseudo  phases 
in  (39)  and  then  locate  the  peaks  of  the  sine  functions  after  applying  the  subpixel  sinusoidal  orthogonal 
principles  (55)- (58)  to  the  pseudo  phases.  Alternatively,  we  can  have  better  flexibility  and  scalability 
by  first  using  the  DXT-ME  algorithm  to  get  an  integer-pel  motion  estimate  and  then  utilizing  the 
pseudo  phase  functions  f(k,  l)  and  g(k,  l)  computed  in  the  DXT-ME  algorithm  as  in  Table  I  to  increase 
estimation  accuracy  to  half-pel,  due  to  the  fact  that  (39)  has  exactly  the  same  form  as  (25).  Specifically, 
based  upon  the  subpixel  sinusoidal  orthogonal  principles  (55)- (58),  the  subpixel  motion  information  can 
be  extracted  in  the  form  of  impulse  functions  with  peak  positions  closely  related  to  the  displacement. 
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TABLE  II 

Determination  of  direction  of  movement  (Au,A„)  from  the  signs  of  DSC  and  DCS 


For  the  sake  of  flexibility  and  modularity  in  design  and  further  reduction  in  complexity,  we  adopt 
the  second  approach  to  devise  a  motion  estimation  scheme  with  arbitrary  fractional  pel  accuracy  by 
applying  the  subpixel  sinusoidal  orthogonal  principles  to  the  pseudo  phase  functions  passed  from  the 
DXT-ME  algorithm.  The  limitation  of  estimation  accuracy  will  only  be  determined  by  the  interaction 
effects  of  the  £  functions  as  explained  in  Section  III  and  the  slope  of  the  £  function  at  and  around  zero 
and  how  well  the  subpixel  motion  information  is  preserved  in  the  pseudo  phases  after  sampling. 

We  define  DCS(u,v)  and  DSC(u,v)  as  follows: 


VV-1  N-l 


DCS(u,v)  =  ED 
k= 0  1=1 
N-l  N-l 


DSC(u,v )  =  EEt 


f(k,l)  ,  kn ,  1.  .  hr,  1. 

C(km)]mS  N(u+2)SmN{v+2)' 

(59) 

g(k,l)  ,  .  kit,  1.  In.  1. 

C(i)C(i),Sm«(“+2)COSJV(,'+2)' 

(60) 

Thus,  from  the  subpixel  sinusoidal  orthogonal  principles  (55)- (58)  and  the  definitions  of  f(k,l )  and 


g(k,l )  in  Table  I,  we  can  show  that 


DCS(u ,  v )  =  -[£(u  —  Au)  +  £(u  4-  A„  +  1)]  •  [£(w  —  A„)  —  £(«  +  A„  +  1)],  (61) 

DSC(u, v)  =  —  [£(u  —  Xu)  —  £(u  +  \u  +  1)]  •  [£(u  —  A„)  +  £(u  +  A„  +  1)].  (62) 


The  rules  to  determine  subpixel  motion  direction  are  summarized  in  Table  II  and  similar  to  the  rules 
in  determination  of  integer-pel  motion  direction  in  [19]. 

Fig.  6  illustrates  how  to  estimate  subpixel  displacements  in  the  DCT  domain.  Fig.  6  (c)  and  (d) 
depict  the  input  images  x\ (m,n)  of  size  16  x  16  (i.e.  N  —  16)  and  a :2(m,n)  displaced  from  x\ (m,n) 
by  (2.5, —2.5)  respectively  at  SNR  =  50  dB.  These  two  images  are  sampled  on  a  rectangular  grid 
at  a  sampling  distance  d  —  0.625  from  the  continuous  intensity  profile  xc(u,v)  =  exp(— (u2  +  v 2))  for 
u,  v  G  [—5, 5]  in  Fig.  6  (a)  whose  Fourier  transform  is  bandlimited  as  in  Fig.  6  (b)  to  satisfy  the  condition 
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(a)  continuous  intensity  profile  xc(u,v) 


x2  (*  Xl  displaced  by  (2  5.-2  5|) 


(b)  FT  of  xc(u,v) 


N=  16.  d- (2  5.-2  S) 


(c)  16  x  16  xi  (m,  n) 


(d)  16  x  16  X‘2 (m,n) 


(e)  pseudo  phase  f{k,l) 


SNR  =  50  dB  00 


(f)  pseudo  phase  g(kj) 

Half  Pel  DSC  (peak  al  (2.5, 1,5)1 


in  Theorem  1.  Fig.  6  (e)  and  (f)  are  the  3-D  plots  of  the  pseudo  phases  f(k,l )  and  g(k,l)  provided  by 
the  DXT-ME  algorithm  which  also  computes  DCS(m,n)  and  DSC(m,n)  as  shown  in  Fig.  6  (g)  and 
(h)  with  peaks  positioned  at  (3, 1)  and  (2, 2)  corresponding  to  the  integer-pel  estimated  displacement 
vectors  (3,  —2)  and  (2,  —3)  respectively  because  only  the  first  quadrant  is  viewed.  As  a  matter  of  fact, 
DCS(m,n)  and  DSC(m,n)  have  large  magnitudes  at  {(m,n);m  =  2,3,  n  =  1,2}. 

To  obtain  an  estimate  at  half-pel  accuracy,  we  calculate  DCS(u,v )  and  DSC(u,v)  in  (59)  and  (60) 
respectively  for  u,v  =  0  :  0.5  :  N  —  1  as  depicted  in  Fig.  6  (i)  and  (j)  where  the  peaks  can  clearly 
be  identified  at  (2.5, 1.5)  corresponding  to  the  motion  estimate  (2.5,  —2.5)  exactly  equal  to  the  true 
displacement  vector  even  though  the  two  input  images  do  not  look  alike.  Note  that  the  notation  a  :  r  :  b 
is  an  abbreviation  of  the  range  {a  +  i  ■  r  for  i  =  0, . . . ,  |_^J }  =  {a,  a  +  r,  a  +  2r, . . . ,  b  —  r,  b}.  For 
comparison,  DCS(u ,  v)  and  DSC(u,  v)  are  also  plotted  in  Fig.  6  (k)  and  (1)  respectively  for  u,  v  =  0  : 
0.25  :  N  —  1  =  0, 0.25, 0.5, ...  ,N  —  1.25,  N  —  1  where  smooth  ripples  are  obvious  due  to  the  £  functions 
inherent  in  DCS  and  DSC  of  (61)-(62)  and  have  peaks  also  at  (2.5, 1.5). 

Therefore,  the  DCT-based  half-pel  motion  estimation  algorithm  (HDXT-ME)  comprises  three  steps: 

1.  The  DXT-ME  algorithm  estimates  the  integer  components  of  the  displacement  as  (mu,mv). 

2.  The  pseudo  phase  functions  from  the  DXT-ME  algorithm,  f(k ,  l)  and  g(k,  l),  are  used  to  compute 
DCS(u,  v )  and  DSC(u,  v)  for  u  G  {m„  -  0.5,  rhu,  rhu  +  0.5}  and  v  G  {mv  -  0.5,  rhv,mv  +  0.5}  from 
(59)  and  (60)  respectively. 

3.  Search  the  peak  positions  of  DCS(u,v )  and  DSC(u,v )  for  the  range  of  indices,  $  =  {{u,v)  :  u  G 
{mu  -  0.5 ,mu,mu  +  0.5};  v  G  {rhv  -  0.5 ,mv,mv  +  0.5}},  to  find 


(ulJCS’vT)Cs)  =  ar9  maX \DCS(u,  u)|, 

u,v{  '!> 

(u~dsc'v~dsc')  =  arg  max_\DSC(u,v)\. 


(63) 

(64) 


These  peak  positions  determine  the  estimated  displacement  vector  (A„,A„). 
solute  value  of  DSC(u,v)  is  less  than  a  preset  threshold  ep  >  0,  then  Xu 
\DCS(u,v)\  <  e£>,  A^  =  -0.5.  Therefore 


UDSC  ~ UDCS ’ 

if  \DSC(udsc,vDSc)\  >  eD 

-0.5, 

if  \DSC(udsc,vdsc)  |  <  ep 

VDCS ~ VDSC' 

if  | DCS(udcs,vdcs)\  >  ep 

-0.5, 

>. 

if  | DC S{uDCS,v DCS)\  <  ep 

However,  if  the  ab- 
=  —0.5.  Likewise,  if 


(65) 


(66) 
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In  Step  2,  only  those  half-pel  estimates  around  the  integer-pel  estimate  ( rhu,mv )  are  considered  due 
to  the  fact  that  the  DXT-ME  algorithm  finds  the  nearest  integer-pel  motion  estimate  ( rhu ,  rhv )  from  the 
subpixel  displacement.  This  will  significantly  reduce  the  number  of  computations  without  evaluating 
all  possible  half-pel  displacements. 

In  Step  3,  the  use  of  en  deals  with  the  case  of  zero  pseudo  phases  when  the  displacement  is  —0.5. 
Specifically,  if  Xu  =  —0.5,  then  g\^ \o  (k,  l)  =  0,  Vfc,  l  which  leads  to  g(k,l)  —  0  and  DSC(u,v)  =  0. 
However,  in  a  noisy  situation,  it  is  very  likely  that  g(k,  l )  is  not  exactly  zero  and  thus  neither  is 
DSC(u,v).  Therefore,  e#  should  be  set  very  small  but  large  enough  to  accommodate  the  noisy  case. 
In  our  experiment,  ep  is  empirically  chosen  to  be  0.08.  Similar  consideration  is  made  on  DCS(u,v)  for 
Xv  =  —0.5.  It  is  also  possible  that  the  peak  positions  of  DCS(u,v)  and  DSC(u,v)  differ  in  the  noisy 
circumstances.  In  this  case,  the  arbitration  rule  used  in  the  DXT-ME  algorithm  may  be  applied  [18], 

[19]- 

To  demonstrate  the  accuracy  of  this  HDXT-ME  algorithm,  we  use  a  16  x  16  dot  image  x\  in  Fig.  7 
(a)  as  input  and  displace  x\  to  generate  the  second  input  image  X‘i  according  to  the  true  motion  field 
{(Xu,  X v)  :  Xu,  Xv  =  —  5  :  0.5  :  4}  shown  in  Fig.  7  (b)  through  the  bilinear  interpolating  function  specified 

in  the  MPEG  standard  [27]  which  interpolates  the  value  x(rn  +  u,  n  +  v)  from  four  neighboring  pixel 

values  for  m,n  being  integers  and  u,v  €  [0, 1)  in  the  following  way: 

x(m  +  u,  n  +  v)  =  (1  —  u)  ■  (1  —  v)  ■  x(m,  n)  +  (1  —  u)  ■  v  ■  x(m ,  n  +  1) 

+  u  ■  (1  —  v)  ■  x(m  +  1,  n)  +  u  ■  v  ■  x(m  +  1,  n  +  1).  (67) 

Fig.  7  (c)  shows  the  estimated  motion  field  by  the  HDXT-ME  algorithm  which  is  exactly  the  same  as 
the  true  motion  field. 

Fig.  8  (a)-(c)  further  illustrate  estimation  accuracy  for  half-pel  motion  estimation  schemes  using 
peak  information  from  Ls(u,v),  Lc(u,v),  and  Lc(u,v)  +Ls(u,v)  respectively.  In  Fig.  8  (a),  the  “+”  line 
indicates  peak  positions  of  Ls(u,v )  found  in  the  index  range  {0  :  0.5  :  15}  for  a  block  size  iV  =  16  with 
respect  to  different  true  displacement  values  (-7  :  0.5  :  7}.  The  “o”  line  specifies  the  final  estimates  after 
determination  of  motion  directions  from  the  peak  signs  of  Ls(u,v)  according  to  the  rules  in  Table  II. 
These  estimates  are  shown  to  align  with  the  reference  line  u  =  v,  implying  their  correctness.  For  the 
true  displacement  =  —0.5,  Ls(— 0.5,  v)  =  0  for  all  v  and  ep  is  used  to  decide  whether  the  estimate  should 
be  set  to  -0.5.  In  Fig.  8  (b),  Lc(u,v )  is  used  instead  of  Ls(u,v)  but  Lc(u,v)  is  always  positive,  inferring 
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Half-Pel  DCT-Based  Motion  Estimation  on  a  Dot  Image  (xb.pgm) 


Little  Dot  (xb.pgm) 


(a)  Input  Image 


(b)  True  Motion  Field 
for  Half-pel  Movement 


(c)  Estimated  Motion  Field 
of  HDXT-ME 

v  Quarter-Pel  DCT-Based  Motion  Estimation  on  a  Dot  Image  (xb.pgm) 
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QDXT-ME  by  moving  a  dot  image  (a)  according  to  the 


Fig.  7.  Estimated  motion  fields  (c)(e)  of  HDXT-ME  and 
true  motion  fields  (b)(d). 
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TRUE  DISPLACEMENT  VS  PEAK  POSITION  OF  Ls(u.v) 


true  displacement  (u=-7  0.5:7) 


(d)  Ls  ( u ,  v)  for  quarter-pel  estimation 


TRUE  DISPLACEMENT  VS  PEAK  POSITION  OF  Lc(u.v)  TRUE  DISPLACEMENT  VS  PEAK  POSITION  OF  ts(u.v)+Lc(U.V) 
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true  displacement  (u*-7.0.5.7)  Vue  displacement  (u*-7  0  57) 

(b)  Lc(u,v)  for  half-pel  estimation  (c)  Lc(u,v)  +Ls(u,v)  for  half-pel  estimation 


TRUE  DISPLACEMENT  VS  PEAK  POSITION  OF  Lcfu.v)  TRUE  DISPLACEMENT  VS  PEAK  POSITION  OF  U(u,v)+Lc(u.v) 


true  displacement  <u=-7.0  25.7)  Vue  displacement  (u~ 7  0  25  7) 


(e)  Lc(u,v)  for  quarter-pel  estimation  (f)  Lc{u,v)  +  Ls(u,v)  for  quarter-pel  estimation 


Fig.  8.  Relation  between  true  displacements  and  peak  positions  for  half-pel  and  quarter-pel  estimation.  The 
signs  of  peak  values  in  Ls(u,v)  indicate  the  motion  directions  and  are  used  to  adjust  the  peak  positions  for 
motion  estimates. 


that  no  peak  sign  can  be  exploited  to  determine  motion  direction.  In  Fig.  8  (c),  Lc(u,v)  +  Ls{u,v) 
provides  accurate  estimates  without  adjustment  for  all  true  displacement  values  but  the  index  range 
must  include  negative  indices,  i.  e.  [—15  :  0.5  :  15]. 

In  the  HDXT-ME  algorithm,  Step  2  involves  only  nine  DCS(u,v)  and  DSC(u,v )  values  at  and 
around  ( mu,rhv ).  Since  DCS(u,v)  and  DSC(u,v)  are  variants  of  inverse  2D-DCT-II,  the  parallel  and 
fully-pipelined  2D  DCT  lattice  structure  proposed  in  [7],  [24],  [25]  can  be  used  to  compute  DCS(u,v) 
and  USC(u,v)  at  a  cost  of  O(N)  operations  in  N  steps.  Furthermore,  the  searching  in  Step  3  requires 
0(N 2)  operations  for  one  step.  Thus,  the  computational  complexity  of  the  HDXT-ME  algorithm  is 
0(N2)  in  total. 

B.  DCT-Based  Quarter-Pel  Motion  Estimation  (QDXT-ME  and  Q4DXT-ME) 

In  Section  III,  we  mention  that  the  interaction  of  two  £  functions  in  Lc(u,v)  and  Ls(u,v )  from  (51) 
and  (52)  disassociates  the  peak  locations  with  the  displacement  (A„,  A„)  for  Au,  Xv  G  [-1.5, 0.5].  In  spite 
of  this,  in  the  HDXT-ME  algorithm,  we  can  still  accurately  estimate  half-pel  displacements  by  locating 
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the  peaks  of  Ls( X,v)  for  true  displacements  A  =  —  N  +  1  :  0.5  :  N  —  1  and  indices  v  =  0  :  0.5  :  N  —  1 
if  ed  is  introduced  to  deal  with  the  case  for  A  =  -0.5.  However,  at  the  quarter-pel  level,  it  does  cause 
estimation  errors  around  A  =  -0.5  as  indicated  in  Fig.  8  (d)  where  the  peaks  of  Ls( A,  v)  stay  at  v  =  0  for 
true  displacements  A  varying  over  [-1,0].  As  mentioned  in  Section  III,  the  sum  of  Lc( A,  v )  and  Ls( A,  v) 
is  a  pure  £  function  and  thus  the  adverse  interaction  is  eliminated.  As  a  result,  the  peak  position  of 
this  sum  can  be  used  to  predict  precisely  the  displacement  at  either  half-pel  level  or  quarter-pel  level 
as  demonstrated  in  Fig.  8  (c)  and  (f)  respectively.  However,  for  two  dimensional  images,  DCS  or  DSC 
has  four  £  functions  as  in  (61)  or  (62).  For  the  DXT-ME  algorithm  provides  two  pseudo  phase  functions 
f(k,l )  and  g(k,l),  only  DCS  and  DSC  are  available  for  subpixel  estimation.  In  this  case,  the  sum  of 
DCS  and  DSC  can  only  annihilates  two  £  functions,  leaving  two  £  functions  as  given  by: 

DCS{u,v)  +  DSC(u,v)  =  ^[£(u  —  A„)£(w  —  A„)  —  £(u  +  Au  +  l)£(v  +  A„  +  1)].  (68) 

Even  though  this  sum  is  not  a  single  £  function,  the  estimation  error  of  using  this  sum  is  limited  to  1/4 
pixel  for  the  worst  case  when  true  displacements  are  either  —0.75  or  -0.25. 

The  above  discussion  leads  to  the  DCT-based  quarter-pel  motion  estimation  algorithm  (QDXT-ME) 
as  follows: 


1.  The  DXT-ME  algorithm  computes  the  integer-pel  estimate  (mu,rhv). 

2.  2 7CS(u,v)  and  T7SC{u,v)  are  calculated  from  f{k,l)  and  g{k,l)  in  (59)  and  (60)  respectively  for 
the  range  of  indices,  $  =  {(«,u)  :  u  —  rhu— 0.75  :  0.25  :  mu  +  0.75;  v  =  rhv  —  0.75  :  0.25  :  rh.,,  +  0.75}. 

3.  Search  the  peak  position  of  D<2(u,  v)  —  DCS(u,v)  +  DSC(u,v)  over  $,  i.  e. 


(■ ud2,vd2 )  =  arg  max  |_D2(u,v)|.  (69) 

The  estimated  displacement  vector  is  obtained  as  follows: 


(  •}  )  —  S 


{UD2,VD2), 


if  \D2(uD2,vD2)\  >  eD, 


(70) 


(-0.5, -0.5),  if  \D2(ud2,vd2)\  <  £d- 

Step  3  is  based  on  the  fact  that  |D2(AU,A„)|  =  0  if  and  only  il  (AU,A„)  =  —0.5.  This  QDXT-ME 
algorithm  follows  the  same  procedure  as  HDXT-ME  except  the  search  region  and  using  the  sum  of  DCS 
and  DSC.  Therefore,  QDXT-ME  has  the  same  computational  complexity,  0(N 2),  as  HDXT-ME. 
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If  we  modify  the  DXT-ME  algorithm  to  provide  the  other  two  pseudo  phase  functions  gcc  and  gss 
in  addition  to  /  and  g ,  we  can  compute  DCC  and  DSS  in  the  following  way: 

_  N-lN-l  ,  ,  ,  , 

DCC(u,v )  =  5Z  5CC(M)cos-^(u  +  -)cos-^(u  +  -),  (71) 

fc=0  1=0  iV  z  iV  z 

DSS(u,v)  =  5]/S(fc,/)sin~(u  +  i)sin^(u  +  i).  (72) 

Then  we  can  show  that 

04(u,  u)  =  0CC(u, «)  +  DCS(u,  v)  +  «)  +  DSS{u,  v)  =  £{u  -  Xu)£{v  -  Xv).  (73) 

This  sum1  contains  only  one  £  without  any  negative  interaction  effect  whose  peak  is  sharp  at  (XU,XV). 
This  leads  to  another  quarter-pel  motion  estimation  algorithm  (Q4DXT-ME),  which  can  estimate  ac¬ 
curately  for  all  displacements  at  the  quarter-pel  or  even  finer  level. 

1.  Find  the  integer-pel  estimate  (mu,rhv)  by  the  DXT-ME  algorithm. 

2.  Obtain  four  pseudo  phases  gcc ,  gcs ,  gsc  and  gss  from  the  modified  DXT-ME  algorithm.  Compute 
DCS(u,v),  DSC(u,v),  DCC(u,v),  and  DSS(u,v)  for  the  range  of  indices,  $  =  {(u,  v)  :  u  = 
rhu  —  0.75  :  0.25  :  mu  +  0.75;  v  =  rhv  —  0.75  :  0.25  :  rhv  +  0.75}. 

3.  Search  the  peak  position  of  (u,  v)  over  (uda,vda)  =  ^rgmax.uv£^\D/i(u,v)\.  The  estimated 
displacement  vector  is  then  the  peak  position:  ( XU,XV )  =  (up 4,wzm)- 

Fig.  9  shows  the  procedure  to  estimate  a  quarter-pel  displacement  with  input  images  a;i(m,  n)  and 
x-i (m,n)  sampled  from  the  continuous  intensity  profile  xc(u,v)  and  its  shift  xc(u  —  A ud,v  —  Xvd)  where 
( Xu ,  A„)  =  (2.75,  —2.75)  and  d  =  0.625  as  shown  in  Fig.  9  (a)  and  (b).  Fig.  9  (c)  and  (d)  plot  DSC(m,n) 
and  DCS(m,n)  whose  peaks  are  both  at  (3,2)  corresponding  to  the  integer-pel  estimate  (3,  —3).  Fig.  9 
(e)  and  (f)  are  the  graphs  of  DSC(u,v)  and  DCS(u,v)  at  the  quarter-pel  level  where  the  estimate  is 
found  to  be  (2.75,  —2.75). 

Similar  to  the  half-pel  case,  Fig.  7  (e)  and  (f)  demonstrate  the  accuracy  of  the  estimated  motion 
fields  determined  by  the  QDXT-ME  and  Q4DXT-ME  algorithms  respectively  as  compared  to  the  true 
motion  field  in  Fig.  7  (d).  The  first  input  image  x\ (m,n)  to  both  algorithms  is  a  bandlimited  dot 
image  in  Fig.  7  and  the  second  input  image  a;2(m,n)  is  generated  by  shifting  x\ (m,n)  with  respect 

1  These  four  functions  can  be  generated  naturally  at  the  same  time  using  the  computing  algorithms  and  architectures  in 
[7],  [24], 
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x  1  (of  size  16  x  16) 


*2  (°  xl  displaced  by  [2  75,-2  75|> 


DSC  (peak  at  (3.2)] 


(a)  16  x  16  x\ (m,n)  sampled  from  xc(u,v)  (b)  X2{m,  n)  sampled  from  xc{u  —  du,  v  —  dv)  (c)  DSC(m,n) 

DCS  (peak  at  (3.2)]  Quarter  Pel  DSC  (peak  at  (2.75.1  75)]  Quarter  Pel  DCS  (peak  at  (2  75.1  75)] 


(d)  DCS(m,n)  (e)  DSC(u,v)  for  u,v  =  0  :  0.25  :  15  (f)  DCS(u,v)  for  it,  v  =  0  :  0.25  :  15 

Fig.  9.  Illustration  of  DCT-based  quarter-pel  motion  estimation  algorithm  (QDXT-ME) 


to  the  true  motion  field  in  Fig.  7  (d)  through  the  bilinear  interpolation.  Though  not  obvious  in  the 
graphs,  the  estimates  of  QDXT-ME  around  -0.5  have  an  estimation  error  up  to  a  quarter  pixel  whereas 
Q4DXT-ME  gives  us  perfect  estimation. 

V.  Experimental  Results 

A  set  of  simulations  are  performed  on  two  video  sequences  of  different  characteristics:  Small  Flower 
Garden  (HFG)  containing  fast  moving  scene  and  Miss  America  (MS)  with  slow  head  and  shoulder  move¬ 
ment  accompanying  with  occasional  eye  and  mouth  opening.  The  performance  of  the  DCT-based  algo¬ 
rithms  is  compared  with  Full  Search  Block  Matching  Algorithm  (BKM-ME)  and  its  subpixel  counter¬ 
parts  in  terms  of  mean  square  error  per  pixel  (MSE)  defined  as  MSE  =  n)  —  x(m,n)}2} /N2 

where  x(m,n)  is  the  reconstructed  image  predicted  from  the  original  image  x(m,n )  based  upon  the 
estimated  displacement  vector  A  =  (Xu,  A„).  For  all  the  MSE  values  computed  in  the  experiment,  the 
bilinear  interpolation  in  (67)  is  used  for  fair  comparison  to  reconstruct  images  displaced  by  a  fractional 
pixel.  Furthermore,  for  visual  comparison,  all  residual  images,  generated  by  subtracting  the  original 
images  from  the  reconstructed  frames  predicted  by  various  motion  estimation  schemes,  are  displayed 
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after  the  saturation  level  is  reset  to  35  instead  of  255  to  make  small  pixel  values  of  the  residual  im¬ 
ages  be  visible.  In  addition,  the  needle  maps  for  the  estimated  motion  fields  are  superimposed  on  the 
corresponding  residual  images. 


The  integer-pel  BKM-ME  algorithm  minimizes  the  MSE  function  of  the  block  (a;i(m,n);  m,n  =  0  : 


1  :  IV  —  1}  over  the  search  area  $  =  {(m, n)  :  m,n  =  —  y  :  1  :  AT  —  1  +  y}  such  that 
(A„,  A„)  =  arg 


mm 

N. 

2  2 


2  ’  A  ^  x  T  ~2 

-  X\{m  -u,n-  v)f 


u,v=-N..A.K 


N2 


(74) 


In  the  simulation,  two  levels  of  subpixel  block-matching  motion  estimation  algorithms  are  implemented 
for  comparison: 


1.  Full  Search  Half-Pel  Block  Matching  Algorithm  (FHBKM-ME)  —  Similar  to  BKM-ME,  FHBKM- 
ME  searches  for  the  displacement  of  minimum  MSE  value  among  all  possible  integer-pel  and  half-pel 
displacements  in  the  search  area  $  as  such: 


(AU,A  v)  =  arg  min 


uv-^£L.i.n 

U,V—  2  •  2  •  2 


-  xl(m  -  u’n  -  v)? 
N2 


(75) 


2.  Full  Search  Quarter-Pel  Block  Matching  Algorithm  (FQBKM-ME)  —  FQBKM-ME  considers  all 
integer-pel,  half-pel  and  quarter-pel  displacements  within  the  search  area  in  finding  the  minimum 
MSE  value.  Precisely,  the  estimated  displacement  vector  is 

Em.nL |hM  -  *i(m  -u,n-  V )]2 


(Au,  A„)  =  arg 


mm 
uv=-£L-l.£L 

uiv  2  ’  4 ’  2 


N2 


(76) 


Therefore,  the  block  matching  approaches  should  provide  the  optimum  residuals  in  terms  of  MSE  values 
and  serve  here  as  a  benchmark  of  how  well  the  DCT-based  algorithms  perform.  It  should  be  noted  that 
all  half  and  quarter  pixel  values  for  FHBKM-ME  and  FQBKM-ME  are  approximated  by  the  bilinear 
interpolation.  However,  for  the  DCT-based  subpixel  algorithms,  no  interpolation  is  needed  in  finding 
the  motion  estimates.  Therefore,  the  number  of  operations  required  by  FHBKM-ME  and  FQBKM-ME 
are  twice  and  four  times  as  much  as  BKM-ME  respectively  whose  computational  complexity  is  0(N4) 
whereas  the  DCT-based  subpixel  algorithms  have  only  marginal  increase  in  computations  over  DXT-ME 
of  which  the  computational  complexity  is  0(N2).  In  the  following  simulation,  simple  edge  extraction 
and  frame  differentiation  are  adopted  for  preprocessing  input  images  before  the  DCT-based  algorithms, 
as  described  in  details  in  [18],  [19].  Either  preprocessing  scheme  adds  in  only  0{N2)  operations  as 
overhead,  keeping  the  total  complexity  remain  0(N2). 
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The  Small  Flower  Garden  sequence  has  99  frames  of  width  352  pixels  and  height  224,  of  which  the 
69th  Frame  is  shown  in  Fig.  10  (a).  This  sequence  is  preprocessed  either  by  edge  extractor  or  by  frame 
differentiator  as  shown  in  Fig.  10  (b)  and  (c)  respectively.  It  is  obvious  from  brightness  of  these  two 
subfigures  that  the  edge  extracted  frame  has  stronger  feature  energy  than  the  frame  difference.  As  a 
result,  it  is  expected  that  HDXT-ME  and  QDXT-ME  perform  better  on  edge  extracted  images  than 
frame  differentiated  images,  as  clearly  demonstrated  by  the  residual  images  in  Fig.  10  (e)  versus  (f) 
and  (h)  versus  (i).  In  Fig.  10  (f)  for  HDXT-ME  after  frame  difference,  there  are  slightly  more  blocks  of 
relatively  high  MSE  than  in  Fig.  10  (e)  for  HDXT-ME  after  edge  extraction.  Compared  to  the  residual 
image  from  FHBKM-ME  in  Fig.  10  (d),  the  residual  from  edge  extracted  HDXT-ME  has  similar  energy 
distribution  except  at  places  close  to  the  twigs  at  the  top  of  the  frame  in  Fig.  10  (e).  However,  at  the 
quarter-pel  level,  the  residuals  for  FQBKM-ME  and  QDXT-ME  after  either  edge  detection  or  frame 
difference  appear  similar  in  brightness  as  displayed  in  Fig.  10  (g)-(i).  In  all  the  residuals,  bright  strips 
at  the  right  indicate  fast  left  translational  motion  of  the  camera. 

The  simulation  results  on  the  HFG  sequence  are  plotted  in  Fig.  11  where  it  can  been  seen  that 
the  DCT-based  algorithms  perform  better  on  16  x  16  blocks  with  32  x  32  search  areas  than  on  8  x  8 
blocks  with  16  x  16  search  areas  in  Fig.  11  (a)-(b),  in  opposite  to  the  block  matching  algorithms.  The 
reason  is  that  a  larger  block  means  more  feature  energy  present  in  the  block  and  thus  it  results  in 
better  estimation  from  the  DCT-based  algorithms  which  try  to  match  energy  patterns  regardless  of  the 
shape  and  texture.  The  general  trend  of  performance  is  summarized  in  Fig.  11  (c)-(d)  where  the  MSE 
values  are  averaged  over  the  whole  sequence.  QDXT-ME  can  achieve  86%  reduction  on  average  over  the 
DIF  value,  the  MSE  value  without  motion  compensation,  as  compared  to  91%  for  FQBKM-ME  for  the 
block  size  16  x  16  but  the  computational  load  for  QDXT-ME  is  significantly  lower  than  FQBKM-ME. 
Furthermore,  all  the  graphs  show  that  QDXT-ME  has  a  very  close  performance  to  Q4DXT-ME  in  terms 
of  MSE  values. 

We  run  simulation  on  Frame  60  to  90  of  the  MS  sequence  whose  frame  size  is  352  x  288.  The  original 
frame  83  is  shown  in  Fig.  12  (a)  and  the  preprocessed  frames  in  Fig.  12  (b)-(c)  where  the  differentiated 
frame  contains  only  very  small  pixel  values  and  thus  need  be  displayed  after  visualization  process; 
otherwise,  its  contents  will  be  invisible.  These  small  DIF  values  indicate  only  slow  head  and  shoulder 
motion  in  this  sequence.  The  residual  images  for  various  methods  in  Fig.  12  (d)-(i)  reveal  that  edge 
extraction  is  better  than  frame  difference  due  to  weak  feature  energy  present  in  the  frame  differentiated 
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(a)  Frame  69 


(b)  Edge  Extracted  Frame  69 


(c)  Frame  Differentiated  Frame  69 


(g)  FQBKM-ME  (h)  QDXT-ME  after  edge  extraction  (i)  QDXT-ME  after  frame  differentiation 

Fig.  10.  Comparison  of  different  approaches  on  Frame  69  of  Small  Flower  Garden  sequence  (HFG)  for  block 
size  16  x  16  and  search  size  32  x  32.  Visualization  is  applied  to  (d)-(i)  by  setting  the  saturation  level  to 
35  due  to  small  pixel  values  in  these  residual  images.  The  needle  maps  for  the  estimated  motion  fields  are 
superimposed  on  the  residual  images. 


sequence.  Furthermore,  there  are  some  small  patches  in  the  clothes  areas  for  either  HDXT-ME  or 
QDXT-ME  in  view  of  the  uniform  brightness  in  these  areas  removed  by  both  preprocessing  functions. 
This  situation  may  be  improved  if  a  better  preprocessing  function  is  used  to  avoid  removal  of  uniform 
areas  while  suppressing  the  aperture  effect  [36].  As  can  be  seen  in  Fig.  13,  great  improvement  is  found 
for  HDXT-ME,  QDXT-ME  and  Q4DXT-ME  over  DXT-ME  for  the  MS  sequence,  particularly  after 
edge  extraction.  Due  to  the  dominant  slow  motion  in  the  sequence,  QDXT-ME  has  71.4%  reduction 
over  DIF  in  comparison  with  61.9%  for  the  integer-pel  DXT-ME  algorithm  and  Q4DXT-ME  has  even 
72.4%  gain.  However,  for  the  HFG  sequence,  the  gain  is  not  much  but  still  noticeable  for  the  subpixel 
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SMALL  FLOWER  GARDEN  (HFG) 

al  ter  edge  extraction 


SMALL  FLOWER  GARDEN  (HFG) 

idler  frame  differentiation 


(a)  Edge  Extracted 

Edge  Extracted  Small  Flower  Garden  (hfg) 


(b)  Frame  Differentiated 

Frame  Differentiated  Small  Flower  Garden  (hfg) 


(c)  Preprocessed  by  Edge  Extraction 

Fig.  11.  Simulation  Results  on  the  Small  Flower  Garden  sequence  (HFG) 


algorithms  over  DXT-ME  as  in  Fig.  11. 


VI.  Conclusion 

In  this  paper,  we  develop  the  DCT-based  subpixel  motion  estimation  techniques  based  on  the  subpel 
sinusoidal  orthogonal  principles  and  preservation  of  subpixel  motion  information  in  DOT  coefficients 
under  the  Nyquist  condition.  These  techniques  can  estimate  subpixel  motion  in  DOT  domain  without 
any  inter-pixel  interpolation  at  a  desired  level  of  accuracy.  Equally  applicable  to  other  areas  as  well, 
the  proposed  techniques  are  applied  to  video  coding  and  result  in  DCT-based  half-pel  and  quarter-pel 
motion  estimation  algorithms  (HDXT-ME,  QDXT-ME,  Q4DXT-ME)  which  estimate  motion  with  half- 
pel  or  quarter-pel  accuracy  without  interpolation  of  input  images.  This  results  in  significant  savings  in 
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(d)  FHBKM-ME 


(e)  HDXT-ME  after  edge  extraction  (f)  HDXT-ME  after  frame  differentiation 


50  100  150  200  250  300  350 


(g)  FQBKM-ME 


50  100  150  200  250 


50  100  ISO  200 


(h)  QDXT-ME  after  edge  extraction  (i)  QDXT-ME  after  frame  differentiation 


Fig.  12.  Comparison  of  different  approaches  on  Frame  83  of  Miss  America  sequence  (MS)  for  block  size  16  x  16 
and  search  size  32  x  32.  Visualization  is  applied  to  (c)-(i)  by  setting  the  saturation  level  to  35.  The  needle 
maps  for  the  estimated  motion  fields  are  laid  over  the  residual  images. 


computational  complexity  for  interpolation  and  far  less  data  flow  compared  to  the  conventional  block 
matching  methods  on  interpolated  images.  Also,  the  resulted  algorithms  are  more  suitable  for  VLSI 
implementation  [?,  ChiuCT:92,LiuKJR:94]  Furthermore,  it  avoids  the  deterioration  of  estimation  preci¬ 
sion  caused  by  interpolation  required  in  most  current  subpixel  motion  estimation  schemes.  In  addition, 
the  proposed  DCT-based  subpixel  motion  estimation  technique  and  the  resulted  algorithms  are  scalable 
in  the  sense  that  higher  estimation  accuracy  can  be  provided  easily  by  applying  the  same  subpel  sinu¬ 
soidal  orthogonal  principles  without  re-computing  pseudo  phases.  Therefore,  flexible  fully  DCT-based 
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Fig.  13.  Simulation  Results  on  the  Miss  America  sequence  (MS) 

codec  design  is  possible  because  the  same  hardware  can  support  different  levels  of  required  accuracy. 
Meanwhile,  the  computational  complexity  of  the  DCT-based  algorithms  is  only  0(N2)  compared  to 
0(Ni)  for  BKM-ME  or  its  subpixel  versions.  Finally,  HDXT-ME,  QDXT-ME  and  Q4DXT-ME  are 
DCT-based,  enabling  us  to  build  a  low-complexity  and  high-throughput  fully  DCT-based  video  coder. 

Appendix 

Equations  (51)-(53)  in  Section  III  are  derived  as  follows: 
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